Party on! A New

Authors

  • Carolin Strobl
  • Torsten Hothorn
  • Achim Zeileis
Abstract

Random forests are one of the most popular statistical learning algorithms, and a variety of methods for fitting random forests and related recursive partitioning approaches is available in R. This paper points out two important features of the random forest implementation cforest available in the party package: The resulting forests are unbiased and thus preferable to the randomForest implementation available in randomForest if predictor variables are of different types. Moreover, a conditional permutation importance measure has recently been added to the party package, which can help evaluate the importance of correlated predictor variables. The rationale of this new measure is illustrated and hands-on advice is given for the usage of recursive partitioning tools in R.
Recursive partitioning methods are amongst the most popular and widely used statistical learning tools for nonparametric regression and classification. Random forests in particular, which can deal with large numbers of predictor variables even in the presence of complex interactions, are being applied successfully in many scientific fields (see, e.g., Lunetta et al., 2004; Strobl et al., 2009, and the references therein for applications in genetics and social sciences). Thus, it is not surprising that there is a variety of recursive partitioning tools available in R (see http://CRAN.R-project.org/view=MachineLearning for an overview). The scope of recursive partitioning methods in R ranges from the standard classification and regression trees available in rpart (Therneau et al., 2008) to the reference implementation of random forests (Breiman, 2001) available in randomForest (Liaw and Wiener, 2002, 2008). Both methods are popular in applied research, and several extensions and refinements have been suggested in the statistical literature in recent years.
One particularly important improvement was the introduction of unbiased tree algorithms, which overcome the major weak spot of the classical approaches available in rpart and randomForest: variable-selection bias. The term variable-selection bias refers to the fact that in standard tree algorithms variable selection is biased in favor of variables offering many potential cut-points, so that variables with many categories and continuous variables are artificially preferred (see, e.g., Kim and Loh, 2001; Shih, 2002; Hothorn et al., 2006; Strobl et al., 2007a, for details). To overcome this weakness of the early tree algorithms, new algorithms have been developed that do not artificially favor splits in variables with many categories or continuous variables. In R such an unbiased tree algorithm is available in the ctree function for conditional inference trees in the party package (Hothorn et al., 2006). The package also provides a random forest implementation cforest based on unbiased trees, which enables learning unbiased forests (Strobl et al., 2007b). Unbiased variable selection is the key to reliable prediction and interpretability in both individual trees and forests. However, while a single tree’s interpretation is straightforward, in random forests an extra effort is necessary to assess the importance of each predictor in the complex ensemble of trees. This issue is typically addressed by means of variable-importance measures such as Gini importance and the “mean decrease in accuracy” or “permutation” importance, available in randomForest in the importance() function (with type = 2 and type = 1, respectively). Similarly, a permutation-importance measure for cforest is available via varimp() in party.
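The functions named above can be tried on a small data set. The following is a minimal sketch, assuming the party and randomForest packages are installed. The built-in iris data, the settings ntree = 50 and mtry = 2, and the use of the cforest_unbiased() control helper are illustrative assumptions and are not taken from the text above.

```r
# Illustrative sketch only: the iris data and all tuning values are assumptions.
library(party)         # ctree(), cforest(), varimp()
library(randomForest)  # randomForest(), importance()

set.seed(42)

# Single conditional inference tree.
ct <- ctree(Species ~ ., data = iris)

# Forest of conditional inference trees, using the package's
# "unbiased" control settings (assumed here via cforest_unbiased()).
cf <- cforest(Species ~ ., data = iris,
              controls = cforest_unbiased(ntree = 50, mtry = 2))

# Permutation importance for the cforest.
imp_cforest <- varimp(cf)

# Reference random forest; importance = TRUE is required so that the
# permutation-based measure is computed during fitting.
rf <- randomForest(Species ~ ., data = iris, ntree = 50, importance = TRUE)

# type = 1: mean decrease in accuracy (permutation importance)
# type = 2: Gini importance
imp_permutation <- randomForest::importance(rf, type = 1)
imp_gini        <- randomForest::importance(rf, type = 2)

print(imp_cforest)
print(imp_permutation)
print(imp_gini)
```

All predictors in iris are numeric, so this sketch only shows the calls side by side; it does not by itself demonstrate the bias that arises when predictors are of different types.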
Unfortunately, variable-importance measures in random forests are subject to the same bias in favor of variables with many categories and continuous variables that affects variable selection in single trees, and also to a new source of bias induced by the resampling scheme (Strobl et al., 2007b). Both problems can be addressed in party to guarantee unbiased variable selection and variable importance for predictor variables of different types. Even though this refined approach can provide reliable variable-importance measures in many applications, the original permutation importance can be misleading in the case of correlated predictors. Therefore, Strobl et al. (2008) suggested a solution for this problem in the form of a new, conditional permutation-importance measure. Starting from version 0.9-994, this new measure is available in the party package. The rationale and usage of this new measure are outlined in the following sections and illustrated by means of a toy example.
Random forest variable-importance
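To make the contrast between the original and the conditional permutation importance concrete, the sketch below simulates a small data set with two correlated predictors and computes both measures. The simulated data, the variable names, and the tuning values are hypothetical and are not the toy example from the paper. The conditional = TRUE argument of varimp() and the cforest_unbiased() helper are assumed from the party package documentation and require a sufficiently recent package version (0.9-994 or later, according to the text above).

```r
# Hypothetical simulation, not the paper's toy example.
library(party)

set.seed(1)
n  <- 200
x1 <- rnorm(n)                  # predictor with a true effect on y
x2 <- x1 + rnorm(n, sd = 0.5)   # correlated with x1, no effect of its own
x3 <- rnorm(n)                  # independent noise predictor
y  <- x1 + rnorm(n)
dat <- data.frame(y, x1, x2, x3)

# Forest of conditional inference trees; cforest_unbiased() is assumed to
# draw subsamples without replacement, which targets the resampling bias.
cf <- cforest(y ~ ., data = dat,
              controls = cforest_unbiased(ntree = 50, mtry = 2))

# Original (marginal) permutation importance.
imp_marginal <- varimp(cf)

# Conditional permutation importance (assumed argument: conditional = TRUE).
imp_conditional <- varimp(cf, conditional = TRUE)

print(rbind(marginal = imp_marginal, conditional = imp_conditional))
```

Because the forest is small and the data are random, the printed values vary with the seed; larger values of ntree would be needed for stable rankings.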





Publication date: 2010